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April 1991 ACM SIGMOD Record, Proceedings of the 1991 ACM SIGMOD international
conference on Management of data SIGMOD '91, volume 20 issue 2
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Publisher: ACM Press 
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The performance of a database system commonly deteriorates due to degradation of the 
database's physical data structure. The structure degradation is a consequence of the 
normal operations of a general database management system. When system performance 
has degraded below acceptable limits the database must be reorganized. In conventional, 
periodic reorganization the database, or part of it, is taken off line while the data 
structure is being reorganized. This paper pr ... 
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20 Issue 2 
Publisher: ACM Press 

Full text available: pdf(1.18 MB) Additional Information: full citation, abstract, references, citings, index

As computer system main memories get larger and processor cycles-per-instruction 
(CPIs) get smaller, the time spent in handling translation lookaside buffer (TLB) misses 
could become a performance bottleneck. We explore relieving this bottleneck by (a) 
increasing the page size and (b) supporting two page sizes. We discuss how to build a TLB 
to support two page sizes and examine both alternatives experimentally with a dozen 
uniprogrammed, user-mode traces for the SPARC architectur ... 
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Application restructuring and performance portability on shared virtual memory and
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The performance portability of parallel programs across a wide range of emerging 
coherent shared address space systems is not well understood. Programs that run well on 
efficient, hardware cache-coherent systems often do not perform well on less optimal or 
more commodity-based communication architectures. This paper studies this issue of 
performance portability, with the commodity communication architecture of interest being 
page-grained shared virtual memory. We begin with applications that per ... 

Locality preserving dictionaries: theory & application to clustering in databases
Vijayshankar Raman 

May 1999 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium 
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On the cost of monitoring and reorganization of object bases for clustering 
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September 1996 ACM SIGMOD Record, volume 25 issue 3 

Publisher: ACM Press 

Full text available: pdf(606.93 KB) Additional Information: full citation, abstract, citings, index terms

Clustering is one of the most effective means to enhance the performance of object base 
applications. Consequently, many proposals exist for algorithms computing good object 
placements depending on the application profile. However, in an effective object base 
reorganization tool the clustering algorithm is only one constituent. In this paper, we 
report on our object base reorganization tool that covers all stages of reorganizing the 
objects: the application profile is determined by a monito ... 
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Patrick C. Fischer, Robert L. Probert 

July 1979 Communications of the ACM, volume 22 issue 1 

Publisher: ACM Press 
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Query evaluation techniques for large databases
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June 1993 ACM Computing Surveys (CSUR), Volume 25 issue 2 
Publisher: ACM Press 

Full text available: pdf(9.37 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Database management systems will continue to manage large data volumes. Thus, 
efficient algorithms for accessing and manipulating large sets and sequences will be 
required to provide acceptable performance. The advent of object-oriented and extensible 
database systems will not solve this problem. On the contrary, modern data models 
exacerbate the problem: In order to manipulate large sets of complex objects as 
efficiently as today's database systems manipulate simple records, query-processi ... 

Keywords: complex query evaluation plans, dynamic query evaluation plans, extensible 
database systems, iterators, object-oriented database systems, operator model of 
parallelization, parallel algorithms, relational database systems, set-matching algorithms, 
sort-hash duality 
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11 Code management: Dynamic code management: improving whole program code locality in managed runtimes
Xianglong Huang, Brian T Lewis, Kathryn S McKinley

June 2006 Proceedings of the 2nd international conference on Virtual execution 
environments VEE '06 

Publisher: ACM Press 

Full text available: pdf(153.03 KB) Additional Information: full citation, abstract, references, index terms

Poor code locality degrades application performance by increasing memory stalls due to
instruction cache and TLB misses. This problem is particularly an issue for large server 
applications written in languages such as Java and C# that provide just-in-time (JIT) 
compilation, dynamic class loading, and dynamic recompilation. However, managed 
runtimes also offer an opportunity to dynamically profile applications and adapt them to 
improve their performance. This paper describes a Dynamic Code Manage ... 

Keywords: code generation, code layout, dynamic optimization, locality, performance 
monitoring, virtual machines 



12 Linear hashing with separators — a dynamic hashing scheme achieving one-access
Per-Ake Larson 

September 1988 ACM Transactions on Database Systems (TODS), volume 13 issue 3
Publisher: ACM Press

Full text available: pdf(1.62 MB) Additional Information: full citation, abstract, references, citings, index terms, review

A new dynamic hashing scheme is presented. Its most outstanding feature is that any 
record can be retrieved in exactly one disk access. This is achieved by using a small 
amount of supplemental internal storage that stores enough information to uniquely 
determine the current location of any record. The amount of internal storage required is 
small: typically one byte for each page of the file. The necessary address computation,
insertion, and expansion algorithms are presented and the perform ...

13 Data page layouts for relational databases on deep memory hierarchies
Anastassia Ailamaki, David J. DeWitt, Mark D. Hill

November 2002 The VLDB Journal — The International Journal on Very Large Data Bases, volume 11 issue 3
Publisher: Springer-Verlag New York, Inc.

Full text available: pdf(593.86 KB) Additional Information: full citation, abstract, index terms

Relational database systems have traditionally optimized for I/O performance and
organized records sequentially on disk pages using the N-ary Storage Model (NSM) 
(a.k.a., slotted pages). Recent research, however, indicates that cache utilization and 
performance is becoming increasingly important on modern platforms. In this paper, we 
first demonstrate that in-page data placement is the key to high cache performance and 
that NSM exhibits low cache utilization on modern platforms. Next, we ... 

Keywords: cache-conscious database systems, disk page layout, relational data
placement



14 Empirical working set behavior 
Juan Rodriguez-Rosell

September 1973 Communications of the ACM, volume 16 issue 9 

Publisher: ACM Press 

Full text available: pdf(457.79 KB) Additional Information: full citation, abstract, references, citings

The working set model for program behavior has been proposed in recent years as a basis 
for the design of scheduling and paging algorithms. Although the words "working set" are
now commonly encountered in the literature dealing with resource allocation, there is a 
dearth of published data on program working set behavior. It is the purpose of this paper 
to present empirical data from actual program measurements, in the hope that workers in 
the field might find experimental eviden ... 

Keywords: paging, program behavior, software measurement, virtual memory, working 
set 



15 An improved network clustering method for I/O-efficient query processing
Sung-Ho Woo, Sung-Bong Yang

November 2000 Proceedings of the 8th ACM international symposium on Advances in
geographic information systems
Publisher: ACM Press

Full text available: pdf(651.16 KB) Additional Information: full citation, abstract, index terms

Efficient network query processing is extremely important in Geographical Information 
Systems (GIS) and Intelligent Transportation Systems (ITS) which include various 
applications of transportation, utility and communication networks, etc. In order to reduce 
the I/O cost in network query processing a given network should be stored with high disk- 
space utilization and a low edge-cut ratio. To do so the nodes in the network should be 
clustered in such a way that each cluster fits in a disk page ... 

16 Eliminating the address translation bottleneck for physical address cache
Tzi-cker Chiueh, Randy H. Katz

September 1992 ACM SIGPLAN Notices, Proceedings of the fifth international
conference on Architectural support for programming languages and
operating systems ASPLOS-V, volume 27 issue 9
Publisher: ACM Press 
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17 Research sessions: implementation techniques: Fractal prefetching B+-Trees: optimizing both cache and disk performance
Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, Gary Valentin

June 2002 Proceedings of the 2002 ACM SIGMOD international conference on
Management of data SIGMOD '02
Publisher: ACM Press

Full text available: pdf(1.49 MB) Additional Information: full citation, abstract, references, citings, index
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B+-Trees have been traditionally optimized for I/O performance with disk pages as tree
nodes. Recently, researchers have proposed new types of B+-Trees optimized for CPU
cache performance in main memory environments, where the tree node sizes are one or a
few cache lines. Unfortunately, due primarily to this large discrepancy in optimal node
sizes, existing disk-optimized B+-Trees suffer from poor cache performance while cache-
optimized B+-Trees exhibi ...



18 Modeling on-line rebalancing with priorities and executing on parallel database systems
Daniel C. Zilio

November 1996 Proceedings of the 1996 conference of the Centre for Advanced 
Studies on Collaborative research 

Publisher: IBM Press 

Full text available: pdf(213.63 KB) Additional Information: full citation, abstract, references, index terms

Because changes to the database (DB) and workload occur during a DB system's lifetime, 
the physical DB design must evolve to sustain good performance. These changes are 
carried out by on-line reorganizations which access the DB and execute concurrently with 
the DB workload. Different performance intrusions are placed on the workload when a 
reorganization is assigned different priorities compared to the workload processes. Our 
work studies the effects of the reorganization priority- level on perfo ... 



19 Physical storage structures: The K-D-B-tree: a search structure for large multidimensional dynamic indexes
John T. Robinson 

April 1981 Proceedings of the 1981 ACM SIGMOD international conference on 
Management of data SIGMOD '81 

Publisher: ACM Press 

Full text available: pdf(723.91 KB) Additional Information: full citation, abstract, references, citings

The problem of retrieving multikey records via range queries from a large, dynamic index 
is considered. By large it is meant that most of the index must be stored on secondary
memory. By dynamic it is meant that insertions and deletions are intermixed with queries,
so that the index cannot be built beforehand. A new data structure, the K-D-B-tree, is 
presented as a solution to this problem. K-D-B-trees combine properties of K-D-trees and 
B-trees. It is expected that the mult ... 




20 Dynamic hashing schemes
R. J. Enbody, H. C. Du

July 1988 ACM Computing Surveys (CSUR), volume 20 issue 2

Publisher: ACM Press

Full text available: pdf(2.52 MB) Additional Information: full citation, abstract, references, citings, index terms

A new type of dynamic file access called dynamic hashing has recently emerged. It
promises the flexibility of handling dynamic files while preserving the fast access times
expected from hashing. Such a fast, dynamic file access scheme is needed to support
modern database systems. This paper surveys dynamic hashing schemes and
examines their critical design issues.



Keywords: dynamic hashing 




The ACM Portal is published by the Association for Computing Machinery. Copyright © 2006 ACM, Inc.

Terms used Adaptive record clustering 




Found 12,928 of 184,245 




Results 1 - 20 of 200 
Best 200 shown 






Adaptive record clustering

C. T. Yu, Cheing-Mei Suen, K. Lam, M. K. Siu

June 1985 ACM Transactions on Database Systems (TODS), volume 10 issue 2
Publisher: ACM Press

Additional Information: full citation, abstract, references, citings, index
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Full text available: pdf(1.58 MB)



An algorithm for record clustering is presented. It is capable of detecting sudden changes
in users' access patterns and then suggesting an appropriate assignment of records to
blocks. It is conceptually simple, highly intuitive, does not need to classify queries into
types, and avoids collecting individual query statistics. Experimental results indicate that 
it converges rapidly; its performance is about 50 percent better than that of the total sort 
method, and about 100 percent better tha ... 

A parallel algorithm for record clustering
Edward Omiecinski, Peter Scheuermann

December 1990 ACM Transactions on Database Systems (TODS), volume 15 issue 4
Publisher: ACM Press

Full text available: pdf(1.82 MB) Additional Information: full citation, abstract, references, citings, index terms, review



We present an efficient heuristic algorithm for record clustering that can run on a SIMD 
machine. We introduce the P-tree, and its associated numbering scheme, which in the 
split phase allows each processor independently to compute the unique cluster number of 
a record satisfying an arbitrary query. We show that by restricting ourselves in the merge 
phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup 
ratio is optimal in the number of processors used. Final ...



Adaptive document clustering
C. T. Yu, Y. T. Wang, C. H. Chen

June 1985 Proceedings of the 8th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(465.87 KB) Additional Information: full citation, references, citings



Adaptive information system design: one query at a time
C. T. Yu, C. H. Chen

May 1985 ACM SIGMOD Record , Proceedings of the 1985 ACM SIGMOD international 
conference on Management of data SIGMOD '85, volume 14 issue 4 



Publisher: ACM Press 

Full text available: pdf(l£3MB) Additional Information: full citation, references, index terms



Research articles and surveys: Research issues in automatic database clustering

Sylvain Guinepain, Le Gruenwald

March 2005 ACM SIGMOD Record, volume 34 issue 1

Publisher: ACM Press

Full text available: pdf(1.42 MB) Additional Information: full citation, abstract, references, index terms

While a lot of work has been published on clustering of data on storage medium, little has 
been done about automating this process. This is an important area because with data
proliferation, human attention has become a precious and expensive resource. Our goal is 
to develop an automatic and dynamic database clustering technique that will dynamically 
re-cluster a database with little intervention of a database administrator (DBA) and 
maintain an acceptable query response time at all times. In th ... 

A global approach to record clustering and file reorganization
Edward Omiecinski, Peter Scheuermann

July 1984 Proceedings of the 7th annual international ACM SIGIR conference on
Research and development in information retrieval
Publisher: British Computer Society

Full text available: pdf(720.73 KB) Additional Information: full citation, abstract, references

We present an integrated method for record clustering and reorganization which can be 
applied to any set of queries whose frequencies of request are known. The clustering 
algorithm works by splitting and merging current clusters and, furthermore, produces a 
new assignment of these clusters to pages in secondary storage. The reorganization
algorithm is an on-line, incremental procedure for allocating the records to their new 
physical locations such that the number of pages swapped in and out of t ... 

Spatial Database Clustering: Using a cluster manager in a spatial database system
Thomas Brinkhoff

November 2001 Proceedings of the 9th ACM international symposium on Advances in
geographic information systems

Publisher: ACM Press

Full text available: pdf(2.62 MB) Additional Information: full citation, abstract, references, index terms

An important goal for a spatial database system is to minimize the I/O-cost of queries and
other operations. One essential component to achieve this objective is the buffer
manager. The placement of the spatial objects on disk pages is another important factor;
a reasonable clustering of the objects helps to minimize the I/O-cost of queries. However,
it is a difficult task to define and maintain an efficient clustering. In this paper, a cluster
manager is proposed, which re-clusters spatial obje ... 

iVIBRATE: Interactive visualization-based framework for clustering large datasets
Keke Chen, Ling Liu 

April 2006 ACM Transactions on Information Systems (TOIS), volume 24 issue 2 
Publisher: ACM Press 

Full text available: pdf(4.48 MB) Additional Information: full citation, abstract, references, index terms

With continued advances in communication network technology and sensing technology,
there is astounding growth in the amount of data produced and made available through 
cyberspace. Efficient and high-quality clustering of large datasets continues to be one of 
the most important problems in large-scale data analysis. A commonly used methodology 
for cluster analysis on large datasets is the three-phase framework of 
sampling/summarization, iterative cluster analysis, and disk-labeling. There are th ... 



Keywords: Clustering, interactive visualization, labeling, large datasets, performance 



9 On the performance of object clustering techniques
Manolis M. Tsangaris, Jeffrey F. Naughton

June 1992 ACM SIGMOD Record, Proceedings of the 1992 ACM SIGMOD international
conference on Management of data SIGMOD '92, volume 21 issue 2
Publisher: ACM Press

Full text available: pdf(1.20 MB) Additional Information: full citation, abstract, references, citings, index terms

We investigate the performance of some of the best-known object clustering algorithms
on four different workloads based upon the Tektronix benchmark. For all four workloads,
stochastic clustering gave the best performance for a variety of performance metrics.
Since stochastic clustering is computationally expensive, it is interesting that for every
workload there was at least one cheaper clustering algorithm that matched or almost 
matched stochastic clustering. Unfortunately, for each worki ... 




10 Self-adaptive, on-line reclustering of complex object data
William J. McIver, Roger King

May 1994 ACM SIGMOD Record, Proceedings of the 1994 ACM SIGMOD international
conference on Management of data SIGMOD '94, volume 23 issue 2
Publisher: ACM Press

Full text available: pdf(1.19 MB) Additional Information: full citation, abstract, references, citings, index
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A likely trend in the development of future CAD, CASE and office information systems will 
be the use of object-oriented database systems to manage their internal data stores. The 
entities that these applications will retrieve, such as electronic parts and their connections 
or customer service records, are typically large complex objects composed of many 
interconnected heterogeneous objects, not thousands of tuples. These applications may 
exhibit widely shifting usage patterns due to their i ... 



11 Data placement in Bubba
George Copeland, William Alexander, Ellen Boughter, Tom Keller

June 1988 ACM SIGMOD Record, Proceedings of the 1988 ACM SIGMOD international
conference on Management of data SIGMOD '88, volume 17 issue 3
Publisher: ACM Press

Full text available: pdf(1.41 MB) Additional Information: full citation, abstract, references, citings, index
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This paper examines the problem of data placement in Bubba, a highly-parallel system for 
data-intensive applications being developed at MCC. "Highly-parallel" implies that load
balancing is a critical performance issue. "Data-intensive" means data is so large that
operations should be executed where the data resides. As a result, data placement
becomes a critical performance issue. In general, determining the optimal placement of 
d ... 



Multi-level hierarchies for scalable ad hoc routing
Elizabeth M. Belding-Royer 

September 2003 Wireless Networks, Volume 9 issue 5 
Publisher: Kluwer Academic Publishers 

Full text available: pdf(465.16 KB) Additional Information: full citation, abstract, references, index terms

Ad hoc networks have the notable capability of enabling spontaneous networks. These
networks are self-initializing, self-configuring, and self-maintaining, even though the 
underlying topology is often continually changing. Because research has only begun to 
scratch the surface of the potential applications of this technology, it is important to 
prepare for the widespread use of these networks. In anticipation of their ubiquity, the 
protocols designed for these networks must be scalable. This inc ... 
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13 Optimization of a hierarchical file organization for spelling correction
Tetsuro Ito, Clement T. Yu 

June 1985 Proceedings of the 8th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(511.78 KB) Additional Information: full citation, abstract, references, citings

A spelling program using a hierarchically organized file seems to be promising, since it
can correct more than common typing mistakes. However, its speed of detecting spelling
errors in the inputs is rather slow. Here some techniques of modifying the program to 
improve the speed are presented. 

14 Towards an efficient management of objects in a distributed environment
A. El Habbash, J. Grimson, C. Horn

July 1990 Proceedings of the second international symposium on Databases in 
parallel and distributed systems 

Publisher: ACM Press 

Full text available: pdf(1.01 MB) Additional Information: full citation, references, index terms




15 Fast object partitioning using stochastic learning automata
B. J. Oommen, D. Ma

November 1987 Proceedings of the 10th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 
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Let Ω = {A_1, ..., A_W} be a set of W objects to be partitioned into R classes {P_1, ...,
P_R}. The objects are accessed in groups of unknown size and the size of these groups
need not be equal. Additionally, the joint access probabilities of the objects are unknown.
The intention is that the objects accessed more frequently together are located in the
same class. This problem has been shown to be ... 

16 A framework for effective retrieval
C. T. Yu, W. Meng, S. Park

June 1989 ACM Transactions on Database Systems (TODS), volume 14 issue 2
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The aim of an effective retrieval system is to yield high recall and precision (retrieval 
effectiveness). The nonbinary independence model, which takes into consideration the
number of occurrences of terms in documents, is introduced. It is shown to be optimal
under the assumption that terms are independent. It is verified by experiments to yield
significant improvement over the binary independence model. The nonbinary model is
extended to normalized vectors and is applicable to more genera ... 
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In relational databases, an attribute of a relation can have only a single primitive value,
making it cumbersome to model complex objects. The object-oriented paradigm removes
this difficulty by introducing the notion of nested objects, which allows the value of an 
object attribute to be another object or a set of other objects. This means that a class 
consists of a set of attributes, and the values of the attributes are objects that belong to 
other classes; that is, the definition of a class fo ... 
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Part of the process of data integration is determining which sets of identifiers refer to the 
same real-world entities. In integrating databases found on the Web or obtained by using 
information extraction methods, it is often possible to solve this problem by exploiting 
similarities in the textual names used for objects in different databases. In this paper we
describe techniques for clustering and matching identifier names that are both scalable 
and adaptive, in the sense that they can ... 
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Most current network intrusion detection systems employ signature-based methods or 
data mining-based methods which rely on labelled training data. This training data is 
typically expensive to produce. Moreover, these methods have difficulty in detecting new 
types of attack. Using unsupervised anomaly detection techniques, however, the system 
can be trained with unlabelled data and is capable of detecting previously "unseen" 
attacks. In this paper, we present a new density-based and grid-based cl ... 
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User traversals on hyperlinks between Web pages can reveal semantic relationships 
between these pages. We use user traversals on hyperlinks as weights to measure 
semantic relationships between Web pages. On the basis of these weights, we propose a 
novel method to put Web pages on a Web site onto different conceptual levels in a link 
hierarchy. We develop a clustering algorithm called PageCluster, which clusters 
conceptually-related pages on each conceptual level of the link hierarchy based on th ... 
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